{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "PAyVU6IyfFxV"
   },
   "source": [
    "# File I/O"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "4efDuMFsfLEj"
   },
   "source": [
    "So far we discussed how to process data, how to build, train and test deep learning models. However, at\n",
    "some point we are likely happy with what we obtained and we want to save the results for later use and\n",
    "distribution. Likewise, when running a long training process it is best practice to save intermediate results\n",
    "(checkpointing) to ensure that we don’t lose several days worth of computation when tripping over the\n",
    "power cord of our server. At the same time, we might want to load a pretrained model (e.g. we might\n",
    "have word embeddings for English and use it for our fancy spam classifier). For all of these cases we need\n",
    "to load and store both individual weight vectors and entire models. This section addresses both issues."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9bRyFUUDffXo"
   },
   "source": [
    "# Tensors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "pKOLpkFAffOF"
   },
   "source": [
    "In its simplest form, we can directly use the save and load functions to store and read tensors\n",
    "separately. This works just as expected.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Zh_5bTwnfO6h"
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch import nn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "x=torch.arange(4,dtype = torch.float32)\n",
    "torch.save(x,\"x-file\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "fI9OUXvVjjCP"
   },
   "source": [
    "Then, we read the data from the stored file back into memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 33
    },
    "colab_type": "code",
    "id": "6487HO2WgD_y",
    "outputId": "8e8c6a76-b724-455f-c3e9-93bacf6a5017"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([0., 1., 2., 3.])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2 = torch.load(\"x-file\")\n",
    "x2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "sB2lwzw1mvGM"
   },
   "source": [
    "We can also store a list of tensors and read them back into memory.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 33
    },
    "colab_type": "code",
    "id": "FdcWhmNIgyRz",
    "outputId": "38d1709a-68c0-44d3-fbbe-53c35d7fe817"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([0., 1., 2., 3.]), tensor([0., 0., 0., 0.]))"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = torch.zeros(4)\n",
    "torch.save([x, y],'x-files')\n",
    "x2, y2 = torch.load('x-files')\n",
    "(x2, y2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0uDaToApm-Cu"
   },
   "source": [
    "We can even write and read a dictionary that maps from a string to an tensors. This is convenient, for\n",
    "instance when we want to read or write all the weights in a model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 33
    },
    "colab_type": "code",
    "id": "0gePUCmzm31U",
    "outputId": "6ab1add9-cb33-48c9-feec-c241e270cf68"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'x': tensor([0., 1., 2., 3.]), 'y': tensor([0., 0., 0., 0.])}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mydict = {'x': x, 'y': y}\n",
    "torch.save( mydict,'mydict')\n",
    "mydict2 = torch.load('mydict')\n",
    "mydict2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "BFkBzZ8snFZH"
   },
   "source": [
    "# Torch Model Parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ONg_f6cnnANf"
   },
   "source": [
    "Saving individual weight vectors (or other  tensors) is useful but it gets very tedious if we want\n",
    "to save (and later load) an entire model. After all, we might have hundreds of parameter groups sprinkled\n",
    "throughout. Writing a script that collects all the terms and matches them to an architecture is quite some\n",
    "work. For this reason torch provides built-in functionality to load and save entire networks rather than\n",
    "just single weight vectors. An important detail to note is that this saves model parameters and not the\n",
    "entire model. I.e. if we have a 3 layer MLP we need to specify the architecture separately. The reason\n",
    "for this is that the models themselves can contain arbitrary code, hence they cannot be serialized quite so\n",
    "easily (there is a way to do this for compiled models - please refer to the torch documentation for the\n",
    "technical details on it). The result is that in order to reinstate a model we need to generate the architecture\n",
    "in code and then load the parameters from disk. The deferred initialization is quite advantageous here\n",
    "since we can simply define a model without the need to put actual values in place. Let’s start with our\n",
    "favorite MLP.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "UN7t8z1LnbeX"
   },
   "outputs": [],
   "source": [
    "class MLP(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.hidden = nn.Linear(20, 256)\n",
    "        self.output = nn.Linear(256, 10)\n",
    "        self.relu = nn.ReLU()\n",
    "    def forward(self, x):\n",
    "        H_1 = self.relu(self.hidden(x))\n",
    "        out = self.output(H_1)\n",
    "        return out\n",
    "\n",
    "net = MLP()\n",
    "x = torch.randn(size=(2, 20))\n",
    "y = net(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CcVEW3Mmu0yQ"
   },
   "source": [
    "Next, we store the parameters of the model as a file with the name mlp.params’"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "hXJ7Tfavtz6v"
   },
   "outputs": [],
   "source": [
    "torch.save(net.state_dict(), 'mlp.params')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FH2BcGybu-p-"
   },
   "source": [
    "To check whether we are able to recover the model we instantiate a clone of the original MLP model.\n",
    "Unlike the random initialization of model parameters, here we read the parameters stored in the file directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 100
    },
    "colab_type": "code",
    "id": "raqKzxuPu6hl",
    "outputId": "cfecbd94-9698-4e99-f306-25873ebdf55d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MLP(\n",
       "  (hidden): Linear(in_features=20, out_features=256, bias=True)\n",
       "  (output): Linear(in_features=256, out_features=10, bias=True)\n",
       "  (relu): ReLU()\n",
       ")"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clone = MLP()\n",
    "clone.load_state_dict(torch.load(\"mlp.params\"))\n",
    "clone.eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9oOuYdZEvHv9"
   },
   "source": [
    "Since both instances have the same model parameters, the computation result of the same input x should\n",
    "be the same. Let’s verify this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 50
    },
    "colab_type": "code",
    "id": "n0ZofYiZvFZ6",
    "outputId": "76552f64-0ef1-47d6-e841-f6692209bca2"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n",
       "        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=torch.uint8)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "yclone = clone(x)\n",
    "yclone == y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "cYXaHhYvvNCW"
   },
   "source": [
    "# Summary\n",
    "\n",
    "• The save and load functions can be used to perform File I/O for tensors objects.\n",
    "\n",
    "• The load_parameters and save_parameters functions allow us to save entire sets of\n",
    "parameters for a network in torch.\n",
    "\n",
    "• Saving the architecture has to be done in code rather than in parameters.\n",
    "\n",
    "# Exercises\n",
    "\n",
    "1. Even if there is no need to deploy trained models to a different device, what are the practical benefits\n",
    "of storing model parameters?\n",
    "\n",
    "2. Assume that we want to reuse only parts of a network to be incorporated into a network of a different\n",
    "architecture. How would you go about using, say the first two layers from a previous network in a\n",
    "new network.\n",
    "\n",
    "3. How would you go about saving network architecture and parameters? What restrictions would\n",
    "you impose on the architecture?"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "File I/O.ipynb",
   "provenance": [],
   "toc_visible": true,
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
